Problem 1

Transform a text input into a matrix. The chosen method is word embedding, using the word2vec algorithm with the continuous bag-of-words (CBOW) model.
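To make the CBOW idea concrete: the model learns to predict a target word from the words around it. The sketch below only builds the (context, target) training pairs for one sentence; the function name `cbow_pairs` and the `window` parameter are illustrative, not taken from the original notebook, and real implementations add refinements such as subsampling of frequent words.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs for one tokenized sentence."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side of the target form its context.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs


pairs = cbow_pairs(["the", "cat", "sat", "down"], window=1)
for context, target in pairs:
    print(context, "->", target)
```

With `window=1`, each word is predicted from its immediate neighbours only; a larger window captures broader context at the cost of noisier pairs.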

Import modules for word embedding, principal component analysis and plotting

Read data from text file

Preprocess raw text

This stage splits the paragraphs into sentences. Each sentence is then split into words, and all words are converted to lowercase.
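A minimal sketch of this stage using only the standard library. The splitting rules here (a sentence ends at `.`, `!` or `?` followed by whitespace; a word is a run of letters, digits or apostrophes) are assumptions, and the original notebook may use a different tokenizer.

```python
import re


def preprocess(raw_text):
    """Split raw text into sentences, then into lowercase word tokens."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    tokenized = []
    for sentence in sentences:
        # Assumption: a word is a run of letters, digits or apostrophes.
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        if words:
            tokenized.append(words)
    return tokenized


sample = (
    "Word embeddings map words to vectors. "
    "Similar words end up close together!\n\n"
    "This is a second paragraph."
)
sentences = preprocess(sample)
for sentence in sentences:
    print(sentence)
```

The result is a list of token lists, one per sentence, which is the input shape that word2vec implementations usually expect.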

Analysis

The word2vec algorithm is applied to the sentences. A sample from the resulting matrix is shown.

Plot a PCA projection of the resulting matrix
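A sketch of the projection itself, computed with NumPy's singular value decomposition rather than a dedicated PCA class; the original notebook may rely on a library implementation instead. The random matrix and the word labels are stand-ins for the real embedding matrix and vocabulary.

```python
import numpy as np


def pca_2d(matrix):
    """Project the rows of `matrix` onto their first two principal components."""
    # PCA requires centered data: subtract the mean of every column.
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Stand-in data: 6 hypothetical words with 10-dimensional embeddings.
rng = np.random.default_rng(0)
matrix = rng.normal(size=(6, 10))
words = ["w0", "w1", "w2", "w3", "w4", "w5"]

coords = pca_2d(matrix)
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.3f}, {y:.3f})")
```

The two columns of `coords` can be passed to a scatter plot, with each point annotated by its word, to visualize which words lie close together in the embedding space.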